Optimal Feature Selection through Search-Based Optimizer in Cross Project

Abstract

Cross-project defect prediction (CPDP) is a key method for estimating defect-prone modules of software products. CPDP is a tempting approach because it provides information about predicted defects for projects in which data are insufficient. Recent studies specifically include instructions on how to pick training data from large datasets using the feature selection (FS) process, which contributes the most to the end results. A classifier then helps classify the picked-up dataset into specified classes in order to predict the defective and non-defective classes. The aim of our research is to select the optimal set of features from multi-class data through a search-based optimizer for CPDP. We used explanatory-type quantitative experimentation. We have the F1 measure as the dependent variable, while as independent variables we have the KNN filter, the ANN filter, the random forest ensemble (RFE) model, the genetic algorithm (GA), and classifiers as manipulative variables. Our experiment follows a 1 factor 1 treatment (1F1T) design for RQ1, whereas for RQ2, RQ3, and RQ4 there is a 1 factor 2 treatments (1F2T) design. We first carried out exploratory data analysis (EDA) to know the nature of the dataset. Then we pre-processed the data by removing and solving the issues identified. During preprocessing, we observed that the data are multi-class; therefore, we rank the features and select multiple feature sets using info gain to get maximum variation in the features. To remove noise, we use the ANN filter and get significant results, more than 40% to 60% compared to the NN filter with the base paper (all, ckloc, IG). Then we applied the search-based optimizer, i.e., the random forest ensemble (RFE), to get the best feature set for the prediction model, and we get 30% to 50% significant results compared with genetic instance selection (GIS). Then we used a classifier to predict defects for CPDP. We compare the classifier results with the base paper using the F1-measure and get almost 35% more than the base paper. We validate the experiment using the Wilcoxon and Cohen’s d tests.
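Three quantities named in the abstract (info gain for ranking features, the F1 measure as the dependent variable, and Cohen’s d for validation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the assumption that each software metric has already been discretized into bins are all hypothetical, chosen only to show what each quantity computes.

```python
import math
from collections import Counter
from statistics import mean, stdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus label entropy conditioned on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the class `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohens_d(sample_a, sample_b):
    """Cohen's d effect size using the pooled sample standard deviation."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * stdev(sample_a) ** 2
                  + (nb - 1) * stdev(sample_b) ** 2) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


# Toy, made-up data: two discretized metrics and a binary defect label.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "loc_bin": ["high", "high", "low", "low", "high", "low"],  # separates the labels exactly
    "cbo_bin": ["high", "low", "high", "low", "high", "low"],  # only weakly informative
}

# Rank feature names from most to least informative about the label.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels),
                 reverse=True)
```

In this toy setup `ranking` places `loc_bin` first because it splits the labels perfectly. `cohens_d` assumes each sample has at least two values and nonzero pooled variance. The Wilcoxon test and the search-based optimizer itself are not sketched here.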

Similar resources

Consistency-based search in feature selection

Feature selection is an effective technique in dealing with dimensionality reduction. For classification, it is used to find an “optimal” subset of relevant features such that the overall accuracy of classification is increased while the data size is reduced and the comprehensibility is improved. Feature selection methods contain two important aspects: evaluation of a candidate feature subset a...

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Fractional Similarity: Cross-Lingual Feature Selection for Search

Training data as well as supplementary data such as usage-based click behavior may abound in one search market (i.e., a particular region, domain, or language) and be much scarcer in another market. Transfer methods attempt to improve performance in these resource-scarce markets by leveraging data across markets. However, differences in feature distributions across markets can change the optimal ...

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

Toward Optimal Feature Selection

In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for feature subset selection is presented. We show that our goal should be to eliminate a feature if it gives us little or no additional information beyond that subsumed by the remaining features. In pa...

Journal

Journal title: Electronics

Year: 2023

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics12030514